DanfeNER - Named Entity Recognition in Nepali Tweets

Abstract

Twitter allows users to easily post tweets on any subject or event at any time, generating massive amounts of rich text content on diverse topics. Automated methods such as Named Entity Recognition (NER) are required to process the tweet data. Processing tweets, however, poses a special challenge: they are informal posts with incomplete context and, because of length constraints, often contain acronyms, hashtags, misspellings, abbreviations, and URLs. This paper presents the first systematic study of NER in Nepali tweets, covering five entity types: Person Name (PER), Location (LOC), Organization (ORG), Date (DAT), and Event (EVT). We develop DanfeNER, a human-labeled, high-quality benchmark data set for the low-resource language Nepali. DanfeNER contains 5,366 records and 3,463 entities in its train set, and 2,301 records and 1,503 entities in its test set. Using this data set, we benchmark several state-of-the-art monolingual and multilingual transformer models, obtaining micro-averaged F1 scores of up to 81%.
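For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for NER. It assumes BIO-style tags (for example B-PER, I-PER, O) and exact-match scoring at the entity level; the abstract does not state the tagging scheme or the matching criterion, so both are assumptions, and the function names and toy data are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        # A "B-" tag opens a span; a stray "I-" tag is treated leniently as an opener.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def micro_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1: pool counts over all sentences and entity types first."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)   # spans predicted with the correct boundaries and type
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented tags, not taken from DanfeNER).
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O", "B-DAT"]]
pred = [["B-PER", "I-PER", "O", "O"], ["B-ORG", "O", "B-EVT"]]
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=2, so F1 = 4/7
```

Because the counts are pooled before precision and recall are computed, frequent entity types influence a micro-averaged score more than rare ones.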

Related resources

Named Entity Recognition from Tweets

Entries in microblogging sites are very short. For example, a ‘tweet’ (a post or status update on the popular microblogging site Twitter) can contain at most 140 characters. To comply with this restriction, users frequently use abbreviations to express their thoughts, thus producing sentences that are often poorly structured or ungrammatical. As a result, it becomes a challenge to come up with ...

Memory-based Named Entity Recognition in Tweets

We present a memory-based named entity recognition system that participated in the MSM-2013 Concept Extraction Challenge. The system expands the training set of annotated tweets with part-of-speech tags and seedlist information, and then generates a sequential memory-based tagger comprised of separate modules for known and unknown words. Two taggers are trained: one on the original capitalized d...

Named Entity Recognition on Turkish Tweets

Various recent studies show that the performance of named entity recognition (NER) systems developed for well-formed text types drops significantly when applied to tweets. The only existing study for the highly inflected agglutinative language Turkish reports a drop in F-measure from 91% to 19% when ported from news articles to tweets. In this study, we present a new named entity-annotated tweet...

Named Entity Recognition and Disambiguation in Tweets Master Thesis

Social media has grown exponentially over the past few years. Users are generating far more unstructured content than ever before. Successful companies are also very active in social media analysing these data for their marketing campaigns. But the informal and noisy nature of such data makes it quite difficult to extract meaningful information out of them. In this thesis, we investigate the pr...

Joint Named Entity Recognition and Stance Detection in Tweets

Named entity recognition (NER) is a well-established task of information extraction which has been studied for decades. More recently, studies reporting NER experiments on social media texts have emerged. On the other hand, stance detection is a considerably new research topic usually considered within the scope of sentiment analysis. Stance detection studies are mostly applied to texts of onli...

Journal

Journal title: Proceedings of the ... International Florida Artificial Intelligence Research Society Conference

Year: 2023

ISSN: 2334-0762, 2334-0754

DOI: https://doi.org/10.32473/flairs.36.133384